20 research outputs found

Weighted Distance-Based Models for Ranking Data Using the R Package rankdist

Author: Qian Zhaozhi
Yu Philip L. H.
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 31/07/2019

rankdist is a recently developed R package which implements various distance-based ranking models. These models capture the probability of occurrence of rankings based on the distances between them. The package provides a framework for fitting and evaluating finite mixtures of distance-based models. This paper also presents a new probability model for ranking data based on a new notion of weighted Kendall distance. The new model is flexible and more interpretable than the existing models. We show that the new model has an analytic form of the probability mass function and that the maximum likelihood estimates of the model parameters can be obtained efficiently even for rankings involving a large number of objects.
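
To make the idea of a distance-based ranking model concrete, here is a minimal Python sketch. It is an illustration only, not the rankdist API (rankdist is an R package). It uses the classical unweighted Kendall distance and a Mallows-type model, P(r) proportional to exp(-theta * d(r, center)); the paper's weighted Kendall distance is not defined in this abstract and is not reproduced here. The function names `kendall_distance` and `mallows_pmf` and the parameter values are assumptions chosen for the example.

```python
import itertools
import math


def kendall_distance(a, b):
    """Count the item pairs that rankings a and b order differently.

    a[i] and b[i] are the ranks assigned to item i.
    """
    n = len(a)
    return sum(
        1
        for i, j in itertools.combinations(range(n), 2)
        if (a[i] - a[j]) * (b[i] - b[j]) < 0
    )


def mallows_pmf(center, theta):
    """Return P(r) for every ranking r, with P(r) proportional to
    exp(-theta * kendall_distance(r, center)).

    Enumerates all n! rankings, so it is only practical for small n.
    """
    rankings = list(itertools.permutations(range(1, len(center) + 1)))
    weights = [math.exp(-theta * kendall_distance(r, center)) for r in rankings]
    total = sum(weights)
    return {r: w / total for r, w in zip(rankings, weights)}


# Hypothetical example: three items, central ranking (1, 2, 3).
pmf = mallows_pmf(center=(1, 2, 3), theta=1.0)
most_likely = max(pmf, key=pmf.get)
```

In this sketch the central ranking is the most probable one, and the probability of any other ranking decays with its distance from the centre. Brute-force enumeration is used purely for clarity; the abstract states that the paper's model has an analytic probability mass function, which is what makes large numbers of objects tractable.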

Journal of Statistical Software

TRIAGE: Characterizing and auditing training data for improved regression

Author: Crabbé Jonathan
Qian Zhaozhi
Seedat Nabeel
van der Schaar Mihaela
Publication date: 29/10/2023

Data quality is crucial for robust machine learning algorithms, with the recent interest in data-centric AI emphasizing the importance of training data characterization. However, current data characterization methods are largely focused on classification settings, with regression settings largely understudied. To address this, we introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors. TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score. We operationalize the score to analyze individual samples' training dynamics and characterize samples as under-, over-, or well-estimated by the model. We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings. Additionally, beyond sample level, we show TRIAGE enables new approaches to dataset selection and feature acquisition. Overall, TRIAGE highlights the value unlocked by data characterization in real-world regression applications.

Comment: Presented at NeurIPS 202

arXiv.org e-Print Archive

Between-centre differences for COVID-19 ICU mortality from early data in England.

Author: Alaa Ahmed M
Ercole Ari
Qian Zhaozhi
van der Schaar Mihaela
Publication venue: Intensive Care Med
Publication date: 01/09/2020

Since the first cases in November 2019, the spread of SARS-CoV-2 infections has placed unprecedented strain on healthcare. The intensive care unit (ICU) is of particular concern as large numbers of patients with severe respiratory complications mean that in some areas, ICUs have been completely overwhelmed [1]

Apollo (Cambridge)

Neural Laplace Control for Continuous-time Delayed Systems

Author: Holt Samuel
Hüyük Alihan
Qian Zhaozhi
Sun Hao
van der Schaar Mihaela
Publication date: 10/04/2023

Many real-world offline reinforcement learning (RL) problems involve continuous-time environments with delays. Such environments are characterized by two distinctive features: firstly, the state x(t) is observed at irregular time intervals, and secondly, the current action a(t) only affects the future state x(t + g) with an unknown delay g > 0. A prime example of such an environment is satellite control, where the communication link between earth and a satellite causes irregular observations and delays. Existing offline RL algorithms have achieved success in environments with irregularly observed states in time or known delays. However, environments involving both irregular observations in time and unknown delays remain an open and challenging problem. To this end, we propose Neural Laplace Control, a continuous-time model-based offline RL method that combines a Neural Laplace dynamics model with a model predictive control (MPC) planner and is able to learn from an offline dataset sampled with irregular time intervals from an environment that has an inherent unknown constant delay. We show experimentally that, on continuous-time delayed environments, it is able to achieve near-expert policy performance.

Comment: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume 206. Copyright 2023 by the author(s)

arXiv.org e-Print Archive

Retrospective cohort study of admission timing and mortality following COVID-19 infection in England.

Author: Alaa Ahmed
Benger Jonathan
Qian Zhaozhi
Rashbass Jem
van der Schaar Mihaela
Publication venue: BMJ Open
Publication date: 01/11/2020

OBJECTIVES: We investigated whether the timing of hospital admission is associated with the risk of mortality for patients with COVID-19 in England, and the factors associated with a longer interval between symptom onset and hospital admission.

DESIGN: Retrospective observational cohort study of data collected by the COVID-19 Hospitalisation in England Surveillance System (CHESS). Data were analysed using multivariate regression analysis.

SETTING: Acute hospital trusts in England that submit data to CHESS routinely.

PARTICIPANTS: Of 14 150 patients included in CHESS until 13 May 2020, 401 lacked a confirmed diagnosis of COVID-19 and 7666 lacked a recorded date of symptom onset. This left 6083 individuals, of whom 15 were excluded because the time between symptom onset and hospital admission exceeded 3 months. The study cohort therefore comprised 6068 unique individuals.

MAIN OUTCOME MEASURES: All-cause mortality during the study period.

RESULTS: Timing of hospital admission was an independent predictor of mortality following adjustment for age, sex, comorbidities, ethnicity and obesity. Each additional day between symptom onset and hospital admission was associated with a 1% increase in mortality risk (HR 1.01; p<0.005). Healthcare workers were most likely to have an increased interval between symptom onset and hospital admission, as were people from Black, Asian and minority ethnic (BAME) backgrounds, and patients with obesity.

CONCLUSION: The timing of hospital admission is associated with mortality in patients with COVID-19. Healthcare workers and individuals from a BAME background are at greater risk of later admission, which may contribute to reports of poorer outcomes in these groups. Strategies to identify and admit high-risk patients and those showing signs of deterioration in a timely way may reduce the consequent mortality from COVID-19, and should be explored.
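
As a rough reading aid for the reported hazard ratio (HR 1.01 per additional day), the short Python sketch below shows how a per-day ratio compounds over a longer delay. It assumes the effect is log-linear in the number of days, as in a standard proportional hazards model; the abstract does not state the exact model form, so this extrapolation is an assumption, and the helper name `cumulative_hazard_ratio` is hypothetical.

```python
def cumulative_hazard_ratio(hr_per_day: float, days: int) -> float:
    """Hazard ratio for a delay of `days`, relative to no delay.

    Assumes a log-linear (multiplicative) effect per day, which is an
    assumption of this sketch, not a statement from the abstract.
    """
    return hr_per_day ** days


# Print the implied relative hazard for a few illustrative delays.
for delay in (1, 7, 14):
    ratio = cumulative_hazard_ratio(1.01, delay)
    print(f"{delay:>2} day(s) of delay: hazard ratio {ratio:.3f}")
```

Under this assumption, a one-week delay corresponds to a hazard ratio of about 1.07, that is, roughly a 7% higher hazard rather than exactly 7 x 1%.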

Directory of Open Access Journals

UWE Bristol Research Repository

Apollo (Cambridge)

Clairvoyance: A Pipeline Toolkit for Medical Time Series

Author: Bica Ioana
Ercole Ari
Jarrett Daniel
Qian Zhaozhi
van der Schaar Mihaela
Yoon Jinsung
Publication date: 28/10/2023

Time-series learning is the bread and butter of data-driven *clinical decision support*, and the recent explosion in ML research has demonstrated great potential in various healthcare settings. At the same time, medical time-series problems in the wild are challenging due to their highly *composite* nature: They entail design choices and interactions among components that preprocess data, impute missing values, select features, issue predictions, estimate uncertainty, and interpret models. Despite exponential growth in electronic patient data, there is a remarkable gap between the potential and realized utilization of ML for clinical research and decision support. In particular, orchestrating a real-world project lifecycle poses challenges in engineering (i.e. hard to build), evaluation (i.e. hard to assess), and efficiency (i.e. hard to optimize). Designed to address these issues simultaneously, Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a (i) software toolkit, (ii) empirical standard, and (iii) interface for optimization. Our ultimate goal lies in facilitating transparent and reproducible experimentation with complex inference workflows, providing integrated pathways for (1) personalized prediction, (2) treatment-effect estimation, and (3) information acquisition. Through illustrative examples on real-world data in outpatient, general wards, and intensive-care settings, we illustrate the applicability of the pipeline paradigm on core tasks in the healthcare journey. To the best of our knowledge, Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML

arXiv.org e-Print Archive

Clairvoyance: A Pipeline Toolkit for Medical Time Series.

Author: Bica Ioana
Ercole Ari
Jarrett Daniel
Qian Zhaozhi
van der Schaar Mihaela
Yoon Jinsung
Publication venue: https://openreview.net/group?id=ICLR.cc/2021/Conference
Publication date: 01/01/2021

Apollo (Cambridge)